Abstract 

We consider a number of basic statistical and graph problems in the message-passing model, where 
we have k machines (sites), each holding a piece of data, and the machines want to jointly solve a 
problem defined on the union of the k data sets. The communication is point-to-point, and the goal 
is to minimize the total communication between the k sites. This model captures all point-to-point 
distributed computational models in terms of communication costs, including the BSP model and the 
MapReduce model. Our analysis shows that exact computation of many statistical and graph problems in 
the distributed setting is very expensive, and often one cannot do better than simply having all machines 
send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, 
one has to allow approximation, or investigate the distribution or layout of the data sets. 

1 Introduction 

Recent years have witnessed a spectacular increase in the amount of data being collected and processed in 
various applications. In many of these applications, the size of the data is too large to fit on a single machine, 
and thus the data is often distributed across a group of machines, referred to as sites in this paper, which 
are connected by a communication network. These sites jointly compute a function defined on the union 
of the data sets by exchanging messages with each other. Concrete models for this type of computation 
include the BSP model [37] and the recent and extensively-studied MapReduce model [12]. Popular system 
architectures include Hadoop, Google's Pregel [29], and Microsoft's Trinity. 

Unlike traditional centralized computation, in which we are mainly concerned with the CPU process- 
ing time and the number of disk accesses, in distributed computational models for big data we are more 
interested in minimizing two objectives, namely, the communication cost and the round complexity. The 
communication cost, which we shall also refer to as the communication complexity, denotes the total num- 
ber of bits exchanged in all messages across the sites during a computation. The round complexity refers 
to the number of messages, which we shall also refer to as rounds, exchanged in order to complete the 
computation, without specifying how many bits need to be exchanged in each message. 

The communication cost is a fundamental measure since communication is often the bottleneck of ap- 
plications, and so it directly relates to overall running time, energy consumption, and network bandwidth 
usage. The round complexity is critical when the computation is partitioned into rounds and the initialization 
of each round requires a large overhead. In this paper we will focus on the communication complexity, and 
analyze problems in an abstract model called the message-passing model (see the definition in Section 1.1) 
that captures all models for point-to-point distributed computation in terms of their communication costs. In 
particular, our lower bound results hold even if the communication protocol sends only a single bit in each 
message, and each site has an unbounded amount of local memory and computational power. Note that this 
means our lower bounds are as strong as possible, not requiring any assumptions on the local computational 
power of the machines. We also present several upper bounds, all of which are also locally computationally 
efficient, meaning the protocols we present do not need extra memory beyond what is required to accommodate 
the input. We will briefly discuss the issue of round-efficiency in Section 7. 

Common sources of massive data include numerical data, e.g., IP streams and logs of queries to a search 
engine, as well as graph data, e.g., web graphs, social networks, and citation graphs. In this paper we investi- 
gate the communication costs for solving several basic statistical and graph problems in the message-passing 
model. Solving these problems is a minimal requirement of protocols seeking to solve more complicated 
functions on distributed data. 

We show that if we want to solve many of these problems exactly, then there are no better solutions than 
the almost trivial ones, which are usually quite communication-inefficient. The motivation of this work is 
thus to deliver the following message to people working on designing protocols for solving problems on 
distributed databases: for many statistical and graph problems in the distributed setting, if we want efficient 
communication protocols, then we need to consider the following relaxations to the original problem: 

1. Allow for returning an approximate solution. Here, approximation can be defined as follows: for 
a problem whose output is a single numerical value $x$, allowing an approximation means that the 
protocol is allowed to return any value $\tilde{x}$ for which $\tilde{x} \in [(1 - \varepsilon)x, (1 + \varepsilon)x]$, for some small 
user-specified parameter $\varepsilon > 0$. For a problem whose output is YES or NO, e.g., a problem deciding if 
a certain property of the input exists or not, we could instead allow the protocol to return YES if the 
input is close to having the property (under some problem-specific notion of closeness) and NO if 
the input is far from having that property. For example, in the graph connectivity problem, we return 
YES if the graph can be made connected by adding a small number of edges, while we return NO 
if the graph requires adding a large number of edges to be made connected. This latter notion of 
approximation coincides with the property testing paradigm [18] in the computer science literature. 

By allowing certain approximations we can sometimes drastically reduce the communication costs. 
Concrete examples and case studies will be given in Section 2 and Section 6.6. 



2. Use well-designed input layouts. Here are two examples: (1) All edges from the same node are stored 
on the same site or on only a few sites. In our lower bounds the edges adjacent to a node are typically 
stored across many different sites. (2) Each edge is stored on a unique site (or on a small number of 
different sites). Our results in Table 1 show that whether or not the input graph has edges that occur on multiple 
sites can make a huge difference in the communication costs. 

3. Explore prior distributional properties of the input dataset, e.g., if the dataset is skewed, or the under- 
lying graph is sparse or follows a power-law distribution. Instead of developing algorithms targeting 
the worst-case distributions, as those used in our lower bounds, if one is fortunate enough to have a 
reasonable model of the underlying distribution of inputs, this can considerably reduce communication 
costs. An extreme example is that of a graph on n vertices: if the graph is completely random, 
meaning each possible edge appears independently with probability p, then the k sites can simply 
compute the total number of edges m to decide whether or not the input graph is connected with high 
probability. Indeed, by results in random graph theory, if $m > 0.51 n \log n$ then the graph is connected 
with very high probability, while if $m < 0.49 n \log n$ then the graph is disconnected with very high 
probability [14]. Of course, completely random graphs are unlikely to appear in practice, though other 
distributional assumptions may also result in more tractable problems. 
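The random-graph example in item 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the "undetermined" outcome are ours, each edge is assumed to be stored on exactly one site (so local counts add up to m), and the logarithm is taken to be natural, matching the classical connectivity threshold of roughly (n ln n)/2 edges; the constants 0.51 and 0.49 are our reading of the thresholds quoted in the text.

```python
import math

def classify_random_graph(n, local_edge_counts, hi=0.51, lo=0.49, log=math.log):
    """Guess connectivity of a completely random graph from edge counts alone.

    Each site sends a single integer (its local edge count), so the total
    communication is about k * log2(m) bits. Assumes no edge is duplicated
    across sites; hi, lo and the base of `log` are assumptions of this sketch.
    """
    m = sum(local_edge_counts)      # total number of edges
    scale = n * log(n)
    if m > hi * scale:
        return "connected"          # holds with very high probability
    if m < lo * scale:
        return "disconnected"       # holds with very high probability
    return "undetermined"           # the count alone is not informative here


if __name__ == "__main__":
    print(classify_random_graph(1024, [2000, 2000]))
    print(classify_random_graph(1024, [1000, 1000]))
```

The point of the sketch is the communication pattern: one number per site suffices once a strong distributional assumption is available.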



1.1 The Message-Passing model 

In this paper we consider the message-passing model, studied, for example, in [31, 38]. In this model we 
have k sites, e.g., machines, sensors, database servers, etc., which we denote as $P_1, \ldots, P_k$. Each site has 
some portion of the overall data set, and the sites would like to compute a function defined on the union of 
the k data sets by exchanging messages. The communication is point-to-point, that is, if $P_i$ talks to $P_j$, then 
the other $k - 2$ sites do not see the messages exchanged between $P_i$ and $P_j$. At the end of the computation, 
at least one of the sites should report the correct answer. The goal is to minimize the total number of bits 
and messages exchanged among the k sites. For the purposes of proving impossibility results, i.e., lower 
bounds, we can allow each site to have an infinite local memory and infinite computational power; note that 
such an assumption will only make our lower bounds stronger. Further, we do not place any constraints on 
the format of messages or any ordering requirement on the communication, as long as it is point-to-point. 

The message-passing model captures all point-to-point distributed communication models in terms of 
the communication cost, including the BSP model by Valiant [37], the $\mathcal{MRC}$ MapReduce framework 
proposed by Karloff et al. [25], the generic MapReduce model by Goodrich et al. [19], and the Massively 
Parallel model by Koutris and Suciu [26]. 

1.2 Our Results 

We investigate lower bounds (impossibility results) and upper bounds (protocols) for the exact computation 
of the following basic statistical and graph problems in the message-passing model. 

1. Statistical problems: computing the number of distinct elements, known as $F_0$ in the database 
literature; and finding the element with the maximum frequency, known as the $\ell_\infty$ or iceberg query 
problem. We note that the lower bound for $\ell_\infty$ also applies to the heavy-hitter problem of finding all 
elements whose frequencies exceed a certain threshold, as well as many other statistical problems for 
which we have to compute the elements with the maximum frequency exactly. 

2. Graph problems: computing the degree of a vertex; testing cycle-freeness; testing connectivity; 
computing the number of connected components (#CC); testing bipartiteness; and testing 
triangle-freeness. 

For each graph problem, we study its lower bound and upper bound in two cases: with edge duplication 
among the different sites and without edge duplication. Our results are summarized in Table 1. Note that 
all lower bounds are matched by upper bounds up to some logarithmic factors. For convenience, we use 
$\tilde{\Omega}(f)$ and $\tilde{O}(f)$ to denote functions of the forms $f / \log^{O(1)}(f)$ and $f \cdot \log^{O(1)}(f)$, respectively. That is, we hide 
logarithmic factors. 

We prove most of our lower bound results via reductions from a meta-problem that we call THRESH$^r_\theta$. 
Its definition is given in Section 4. 



In Section 6.6 we make a conjecture on the lower bound for the diameter problem, i.e., the problem of 
computing the distance of the farthest pair of vertices in a graph. This problem is one of the few problems 
that we cannot completely characterize by the technique proposed in this paper. We further show that by 
allowing an error as small as an additive 2, we can reduce the communication cost of computing the diameter 
by roughly a $\sqrt{n}$ factor, compared with the naive algorithm for exact computation. This further supports our 
claim that even a very slight approximation can result in a dramatic savings in communication. 
Table 1: All results are in terms of number of bits of communication. Our lower bounds hold for randomized 
protocols which succeed with at least a constant probability of 2/3, while all of our upper bounds are 
deterministic protocols (which always succeed). k refers to the number of sites, with a typical value ranging 
from 100 to 10000 in practice. For $F_0$ and $\ell_\infty$, n denotes the size of the element universe. For graph 
problems, n denotes the number of vertices and m denotes the number of edges. $d_v$ is the degree of the 
queried vertex v. We make the mild assumption that $\Omega(\log n) \le k \le \min\{n, m\}$. Let $r = \min\{n, m/k\}$. 
Except for the upper bound for cycle-freeness in the "without duplication" case, for which $m \ge n$ implies 
that a cycle necessarily exists (and therefore makes the problem statement vacuous), we assume that $m \ge n$ 
in order to avoid a messy and uninteresting case-by-case analysis. 



1.3 Related Work 

Quite a few graph problems have been recently studied in the MapReduce model, including computing 
the connected components [28, 32], counting the number of triangles [35, 4], finding a minimum spanning 
tree [28], computing a matching and an edge cover [28], finding a minimum cut [28], and finding densest 
subgraphs [7]. The primary goal of all these works is to minimize the round complexity, given various 
constraints on the number of messages that each site can send/receive at each round. The algorithm in [32] 
for finding connected components requires $O(\log n)$ rounds and $O(n + m)$ bits of communication per 
round. For triangle-counting, the total amount of computation (including the local running time of each site) 
in [35] is $O(m^{3/2})$. For matching and edge cover, certain forms of approximation are needed in [28], from 
which it is shown that the total computational time is $O(m)$. In this line of research, an implicit assumption 
is generally made that there is no edge duplication in the input across the different sites. 

Ahn, Guha and McGregor [5, 6] developed an elegant technique for sketching graphs, and showed its 
applicability to many graph problems including connectivity, bipartiteness, and minimum spanning tree. 
Each sketching step in these algorithms can be implemented in the message-passing model as follows: each 
site computes a sketch of its local graph and sends its sketch to $P_1$. The site $P_1$ then combines these k 
sketches into a sketch of the global graph. The final answer can be obtained based on the global sketch 
that $P_1$ computes. Most sketches in [5, 6] are of size $\tilde{O}(n^{1+\gamma})$ bits (for a small constant $\gamma > 0$), and the 
number of sketching steps varies from 1 to a constant. Thus direct implementations of these algorithms 
in the message-passing model have communication $\tilde{O}(k \cdot n^{1+\gamma})$ bits. A good property of these sketching 
algorithms is that the sketches can be constructed and maintained in the streaming setting: the edges of the 
local input graphs can arrive on the site one at a time, without requiring the site to maintain its entire local 
input in memory. Therefore, these sketching algorithms are quite useful for processing massive graphs. 

For statistical problems, a number of approximation algorithms have been proposed recently in the 
distributed streaming model, which can be thought of as a dynamic version of the one-shot distributed 
computation model considered in this paper: the k local inputs arrive in the streaming fashion and one of 
the sites has to continuously monitor a function defined on the union of the k local inputs. All protocols 
in the distributed streaming model are also valid protocols in our one-shot computational model, while 
our impossibility results in our one-shot computational model also apply to all protocols in the distributed 
streaming model. Example functions studied in the distributed streaming model include $F_0$ [11], $F_2$ (size of 
self join) [11, 38], quantile and heavy-hitters [21]. All of these problems have much lower communication 
cost if one allows an approximation of the output number $x$ in a range $[(1 - \varepsilon)x, (1 + \varepsilon)x]$, as mentioned above 
(the definition as to what $\varepsilon$ is for the various problems differs). These works show that if an approximation 
is allowed, then all these problems can be solved using only $\tilde{O}(k/\varepsilon^{O(1)})$ bits of communication. 

On the lower bound side, it seems not much is known for graph problems. Phillips et al. [31] proved 
an $\Omega(kn/\log^2 k)$ bits lower bound for connectivity. Their lower bound proof relies on a well-crafted graph 
distribution. In this paper we improve their lower bound by a factor of $\log k$. Another difference is that their 
proof requires the input to have edge duplications, while our lower bound holds even if there are no edge 
duplications, showing that the problem is hard even if each edge occurs on a single site. Very recently in an 
unpublished manuscript, Huang et al. [20] showed that $\Omega(kn)$ bits of communication is necessary in order 
to even compute a constant factor approximation to the size of the maximum matching of a graph. Their 
result, however, requires that the entire matching has to be reported, and it is unknown if a similar lower 
bound applies if one is only interested in estimating the matching size. 

For statistical problems, a suite of lower bounds for approximate computations was developed in [38]. 
The problems considered in that work include the number $F_0$ of distinct elements, the p-th frequency 
moment and self-join size, the heavy-hitters problem, the quantiles and the empirical entropy. For exact $F_0$ 
computation, the best previous communication cost lower bound was $\Omega(F_0 + k)$ bits. In this paper we 
improve the communication cost lower bound to $\tilde{\Omega}(kF_0)$, which is optimal up to a small logarithmic factor. 
We remark that computing $F_0$ exactly also allows one to determine whether an item is missing from the 
union of the k data sets, a problem whose complexity has been studied in the data streaming literature, see, 
e.g., [3]. 

Besides statistical and graph problems, Koutris and Suciu [26] studied evaluating conjunctive queries in 
their massively parallel model. Their lower bounds are restricted to one round of communication, and the 
message format has to be tuple-based. We stress that in our message-passing model there is no such restric- 
tion on the number of rounds and the message format; our lower bounds apply to arbitrary communication 
protocols. Recently, Daumé III et al. [22, 23] and Balcan et al. [8] studied several problems in the setting of 
distributed learning, in the message-passing model. 

1.4 Conventions 

Let [n] = {1, . . . , n}. All logarithms are base-2. All communication complexities are in terms of bits. 
We typically use capital letters X,Y, . . . for sets or random variables, and lower case letters x, y, . . . for 
specific values of the random variables X, Y, .... We write $X \sim \mu$ to denote a random variable chosen from 
distribution $\mu$. For convenience we often identify a set $X \subseteq [n]$ with its characteristic vector when there is 
no confusion, i.e., the bit vector which is 1 in the i-th bit if and only if element i occurs in the set X. 

Let $H(X)$ denote the usual Shannon entropy of a random variable X, that is, 
$H(X) = \sum_x \Pr[X = x] \log(1/\Pr[X = x])$. And let $H(X \mid Y)$ denote the conditional entropy, that is, 
$H(X \mid Y) = \sum_y \Pr[Y = y] H(X \mid Y = y)$. Let $H_b(p)$ denote the binary entropy function when $p \in [0, 1]$, that is, 
$H_b(p) = p \log(1/p) + (1 - p) \log(1/(1 - p))$, with the usual convention that $H_b(0) = H_b(1) = 0$. 
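As a concrete reading of the three entropy definitions above, the following minimal sketch evaluates them for finite distributions. The function names and the dictionary-based representation of distributions are our own choices for illustration, not notation from the paper; logarithms are base 2, following the stated convention.

```python
import math

def entropy(dist):
    """Shannon entropy H(X) in bits; dist maps each value x to Pr[X = x]."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(X | Y) in bits; joint maps each pair (x, y) to Pr[X = x, Y = y]."""
    marginal_y = {}
    for (_, y), p in joint.items():
        marginal_y[y] = marginal_y.get(y, 0.0) + p
    total = 0.0
    for y, p_y in marginal_y.items():
        if p_y == 0:
            continue
        # Distribution of X conditioned on Y = y.
        cond = {x: p / p_y for (x, yy), p in joint.items() if yy == y}
        total += p_y * entropy(cond)
    return total

def binary_entropy(p):
    """H_b(p) with the convention H_b(0) = H_b(1) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return p * math.log2(1.0 / p) + (1 - p) * math.log2(1.0 / (1 - p))
```

For example, two independent uniform bits give H(X | Y) = 1, while X = Y gives H(X | Y) = 0.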



1.5 Roadmap 

In Section 2 we give a case study on the number of distinct elements ($F_0$) problem. In Section 3 we include 
background on communication complexity which is needed for understanding the rest of the paper. In 
Section 4 we introduce the meta-problem THRESH$^r_\theta$ and study its communication complexity. In Section 5 
and Section 6 we show how to prove lower bounds for a set of statistical and graph problems by performing 
reductions from THRESH$^r_\theta$. We conclude the paper in Section 7. 

2 The Number of Distinct Elements: A Case Study 

In this section we give a case study on the number of distinct elements ($F_0$) problem, with the purpose 
of justifying the statement that approximation is often needed in order to obtain communication-efficient 
protocols in the distributed setting. 

The $F_0$ problem requires computing the number of distinct elements of a data set. For example, given a 
table suppliers containing attributes supplier-id, supplier-name and city, if we want to count the number of 
different cities in the table, we can write the following SQL statement: 

SELECT COUNT(DISTINCT city) as "Distinct Cities" 
FROM suppliers; 

This problem has numerous applications in query optimization [34], data mining in graph databases [30], 
network traffic monitoring [15], data integration [10], data warehousing, and many other database and 
networking areas. 

The $F_0$ problem has been extensively studied in both the database and theory communities in the last 
three decades, mainly in the data stream model. It began with the work of Flajolet and Martin [17] and 
culminated in an optimal algorithm by Kane et al. [24]. In the streaming setting, we see a stream of 
elements coming one at a time and the goal is to compute the number of distinct elements in the stream 
using as little memory as possible. In [16], Flajolet et al. reported that their HyperLogLog algorithm can 
estimate cardinalities beyond $10^9$ using a memory of only 1.5KB, and achieve a relative accuracy of 2%, 
compared with the $10^9$ bytes of memory required if we want to compute $F_0$ exactly. 

Similar situations happen in the distributed communication setting, where we have k sites, each holding 
a set of elements from the universe [n], and the sites want to compute the number of distinct elements of 
the union of their k data sets. In [11], a $(1 + \varepsilon)$-approximation algorithm (protocol) with 
$O(k(\log n + 1/\varepsilon^2 \log 1/\varepsilon))$ bits of communication was given in the distributed streaming model, which is also a protocol 
in the message-passing model. In a typical setting, we could have $\varepsilon = 0.01$, $n = 10^9$ and $k = 1000$, 
in which case the communication cost is about $6.6 \times 10^7$ bits.¹ On the other hand, our result shows that 
if exact computation is required, then the communication cost among the k sites needs to be at least 
$\Omega(kF_0/\log k)$ (see the corresponding corollary for $F_0$), which is already $10^9$ bits even when $F_0 = n/100$. 
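The comparison above is simple arithmetic and can be checked directly. The sketch below plugs the stated parameters into the two expressions, ignoring the constants hidden in the asymptotic notation (as the paper's footnote does); the function names are ours and base-2 logarithms are assumed, following the paper's conventions.

```python
import math

def approx_f0_bits(k, n, eps):
    """k * (log n + (1/eps^2) * log(1/eps)), hidden constants ignored."""
    return k * (math.log2(n) + (1.0 / eps ** 2) * math.log2(1.0 / eps))

def exact_f0_lower_bound_bits(k, f0):
    """k * F0 / log k, hidden constants ignored."""
    return k * f0 / math.log2(k)

if __name__ == "__main__":
    k, n, eps = 1000, 10 ** 9, 0.01
    print(f"approximate protocol: {approx_f0_bits(k, n, eps):.3g} bits")
    print(f"exact lower bound:    {exact_f0_lower_bound_bits(k, n // 100):.3g} bits")
```

With these parameters the approximate protocol costs roughly 6.6 x 10^7 bits, while the exact lower bound already exceeds 10^9 bits, a gap of more than an order of magnitude.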

3 Preliminaries 

In this section we introduce some background on communication complexity. We refer the reader to the 
book by Kushilevitz and Nisan [27] for a more complete treatment. 

In the basic two-party communication complexity model, we have two parties (also called sites or 
players), which we denote by Alice and Bob. Alice has an input x and Bob has an input y, and they want to 
jointly compute a function $f(x, y)$ by communicating with each other according to a protocol $\Pi$. Let $\Pi(x, y)$ 
be the transcript of the protocol, that is, the concatenation of the sequence of messages exchanged by Alice 
and Bob, given the inputs x and y. In this paper when there is no confusion, we abuse notation by using $\Pi$ 
for both a protocol and its transcript, and we further abbreviate the transcript $\Pi(x, y)$ by $\Pi$. 

¹ In the comparison we neglect the constants hidden in the big-O and big-Ω notation, which should be small. 

The deterministic communication complexity of a deterministic protocol is defined to be 
$\max\{|\Pi(x, y)| : \text{all possible inputs } (x, y)\}$, where $|\Pi(x, y)|$ is the number of bits in the transcript of the 
protocol $\Pi$ on inputs x and y. The randomized communication complexity of a randomized protocol $\Pi$ is 
the maximum number of bits in the transcript of the protocol over all possible inputs x, y, together with all 
possible random tapes of the players. We say a randomized protocol $\Pi$ computes a function $f$ correctly with 
error probability $\delta$ if for all input pairs (x, y), it holds that $\Pr[\Pi(x, y) \ne f(x, y)] \le \delta$, where the probability 
is taken only over the random tapes of the players. The randomized $\delta$-error communication complexity of 
a function $f$, denoted by $R^\delta(f)$, is the minimum communication complexity of a protocol that computes $f$ 
with error probability at most $\delta$. 

Let $\mu$ be a distribution over the input domain, and let $(X, Y) \sim \mu$. For a deterministic protocol $\Pi$, we 
say that $\Pi$ computes $f$ with error probability $\delta$ on $\mu$ if $\Pr[\Pi(X, Y) \ne f(X, Y)] \le \delta$, where the probability 
is over the choices of $(X, Y) \sim \mu$. The $\delta$-error $\mu$-distributional communication complexity of $f$, denoted by 
$D_\mu^\delta(f)$, is the minimum communication complexity of a deterministic protocol that computes $f$ with error 
probability $\delta$ on $\mu$. We use $E[D_\mu^\delta(f)]$ to denote the expected distributional communication complexity, where 
the expectation is taken over the input distribution $\mu$. 

We can generalize the two-party communication complexity to the multi-party setting, which is the 
message-passing model considered in this paper. Here we have k players (also called sites) $P_1, \ldots, P_k$ with 
$P_i$ having the input $x_i$, and the players want to compute a function $f(x_1, \ldots, x_k)$ of their joint inputs by 
exchanging messages with each other. The transcript of a protocol always specifies which player speaks 
next. In this paper the communication is point-to-point, that is, if $P_i$ talks to $P_j$, the other players do not 
see the messages sent from $P_i$ to $P_j$. At the end of the communication, only one player needs to output the 
answer. 

The following lemma shows that randomized communication complexity is lower bounded by 
distributional communication complexity under any distribution $\mu$. 

Lemma 1 (one direction of Yao's Lemma [39]) For any function f and any $\delta > 0$, 

$$R^\delta(f) \ge \max_\mu D_\mu^\delta(f).$$

Proof: The original proof is for two players, though this also holds for k > 2 players since for any 
distribution $\mu$, if $\Pi$ is a $\delta$-error protocol then for all possible inputs $x^1, \ldots, x^k$ to the k players, 

$$\Pr_{\text{random tapes of the players}}[\Pi(x^1, \ldots, x^k) = f(x^1, \ldots, x^k)] \ge 1 - \delta,$$

which implies for any distribution $\mu$ on $(x^1, \ldots, x^k)$ that 

$$\Pr_{\text{random tapes of the players},\ (x^1, \ldots, x^k) \sim \mu}[\Pi(x^1, \ldots, x^k) = f(x^1, \ldots, x^k)] \ge 1 - \delta,$$

which implies there is a fixing of the random tapes of the players so that 

$$\Pr_{(x^1, \ldots, x^k) \sim \mu}[\Pi(x^1, \ldots, x^k) = f(x^1, \ldots, x^k)] \ge 1 - \delta,$$

which implies $D_\mu^\delta(f)$ is at most $R^\delta(f)$. ∎ 



Therefore, one way to prove a lower bound on the randomized communication complexity of $f$ is to first 
pick a (hard) input distribution $\mu$ for $f$, and then study its distributional communication complexity under $\mu$. 

Note that given a 1/3-error randomized protocol for a problem $f$ whose output is 0 or 1, we can always 
run the protocol $C \log(1/\delta)$ times using independent randomness each time, and then output the majority of 
the outcomes. By a standard Chernoff bound (see below), the output will be correct with error probability 
at most $e^{-\kappa C \log(1/\delta)}$ for an absolute constant $\kappa$, which is at most $\delta$ if we choose C to be a sufficiently large 
constant. Therefore $R^{1/3}(f) = \Omega(R^\delta(f)/\log(1/\delta)) = \Omega(\max_\mu D_\mu^\delta(f)/\log(1/\delta))$ for any $\delta \in (0, 1/3]$. 
Consequently, to prove a lower bound on $R^{1/3}(f)$ we only need to prove a lower bound on the distributional 
communication complexity of $f$ with an error probability $\delta \le 1/3$. 
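The majority-vote amplification described above can be made concrete by computing the exact error of the majority outcome. This is a minimal sketch: the function name is ours, and it assumes each run errs independently with exactly the stated probability and that the number of runs is odd (so there are no ties).

```python
from math import comb

def majority_error(p_err, runs):
    """Exact probability that the majority of `runs` independent runs is wrong.

    Each run is wrong independently with probability p_err; `runs` is assumed
    odd. The majority is wrong when more than half of the runs are wrong.
    """
    return sum(
        comb(runs, wrong) * p_err ** wrong * (1 - p_err) ** (runs - wrong)
        for wrong in range(runs // 2 + 1, runs + 1)
    )

if __name__ == "__main__":
    for runs in (1, 3, 11, 51, 101):
        print(runs, majority_error(1 / 3, runs))
```

The error decays exponentially in the number of repetitions, which is why $O(\log(1/\delta))$ repetitions suffice to reach error $\delta$.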

Chernoff bound. Let $X_1, \ldots, X_n$ be independent Bernoulli random variables such that $\Pr[X_i = 1] = p_i$. 
Let $X = \sum_{i \in [n]} X_i$. Let $\mu = E[X]$. It holds that $\Pr[X \ge (1 + \delta)\mu] \le e^{-\delta^2 \mu/3}$ and 
$\Pr[X \le (1 - \delta)\mu] \le e^{-\delta^2 \mu/2}$ for any $\delta \in (0, 1)$. 
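As a sanity check on the two inequalities, the sketch below compares exact binomial tails against the Chernoff bounds in the special case where all $p_i$ are equal. The helper names are ours, and the thresholds are passed as integers to avoid floating-point rounding at the boundary.

```python
from math import comb, exp

def upper_tail(n, p, threshold):
    """Exact Pr[X >= threshold] for X ~ Binomial(n, p), integer threshold."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(threshold, n + 1))

def lower_tail(n, p, threshold):
    """Exact Pr[X <= threshold] for X ~ Binomial(n, p), integer threshold."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(0, threshold + 1))

def chernoff_upper(mu, delta):
    return exp(-delta ** 2 * mu / 3)

def chernoff_lower(mu, delta):
    return exp(-delta ** 2 * mu / 2)

if __name__ == "__main__":
    n, p, delta = 100, 0.5, 0.2          # mu = 50, thresholds 60 and 40
    print(upper_tail(n, p, 60), "<=", chernoff_upper(n * p, delta))
    print(lower_tail(n, p, 40), "<=", chernoff_lower(n * p, delta))
```

The exact tails are typically far below the bounds; the bounds are used in the paper because they hold uniformly and are easy to manipulate.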

4 A Meta-Problem 

In this section we discuss a meta-problem THRESH$^r_\theta$ and we derive a communication lower bound for it. 
This meta-problem will be used to derive lower bounds for statistical and graph problems in our applications. 

In the THRESH$^r_\theta$ problem, site $P_i$ ($i \in [k]$) holds an r-bit vector $x_i = \{x_{i,1}, \ldots, x_{i,r}\}$, and the k sites 
want to compute 

$$\mathrm{THRESH}^r_\theta(x_1, \ldots, x_k) = \begin{cases} 0, & \text{if } \sum_{j \in [r]} \left(\bigvee_{i \in [k]} x_{i,j}\right) \le \theta, \\ 1, & \text{if } \sum_{j \in [r]} \left(\bigvee_{i \in [k]} x_{i,j}\right) \ge \theta + 1. \end{cases}$$

That is, if we think of the input as a $k \times r$ matrix with $x_1, \ldots, x_k$ as the rows, then in the THRESH$^r_\theta$ problem 
we want to find out whether the number of columns that contain a 1 is more than $\theta$ for a threshold parameter $\theta$. 


We will show a lower bound for THRESH$^r_\theta$ using the symmetrization technique introduced in [31]. First, 
it will be convenient for us to study the problem in the coordinator model. 

The Coordinator Model. In this model we have an additional site called the coordinator,² which has no 
input (formally, his input is the empty set). We require that the k sites can only talk to the coordinator. The 
message-passing model can be simulated by the coordinator model since every time a site $P_i$ wants to talk 
to $P_j$, it can first send the message to the coordinator, and then the coordinator can forward the message to 
$P_j$. Such a re-routing only increases the communication complexity by a factor of 2 and thus will not affect 
the asymptotic communication complexity. 
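The factor-of-2 claim can be seen by simple accounting. The sketch below is only an illustration: the function name and the `(sender, receiver, bits)` message format are hypothetical, and it assumes, as in the text, that the coordinator learns the destination of each message from the protocol itself, so no extra addressing bits are charged.

```python
def reroute_costs(messages):
    """Compare communication before and after re-routing via a coordinator.

    messages: list of (sender, receiver, bits) triples describing a
    point-to-point protocol. Every message is sent once to the coordinator
    and once from the coordinator to the receiver, so its bits count twice.
    Returns (point_to_point_bits, coordinator_model_bits).
    """
    point_to_point = sum(bits for _, _, bits in messages)
    via_coordinator = sum(2 * bits for _, _, bits in messages)
    return point_to_point, via_coordinator

if __name__ == "__main__":
    print(reroute_costs([(1, 2, 10), (2, 3, 5)]))
```

Because the overhead is a constant factor, any lower bound proved in the coordinator model transfers to the message-passing model up to that factor.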

Let $f : \mathcal{X} \times \mathcal{Y} \to \{0, 1\}$ be an arbitrary function. Let $\mu$ be a probability distribution over $\mathcal{X} \times \mathcal{Y}$. Let 
$f^k_{\mathrm{OR}} : \mathcal{X}^k \times \mathcal{Y} \to \{0, 1\}$ be the problem of computing $f(x_1, y) \vee f(x_2, y) \vee \ldots \vee f(x_k, y)$ in the coordinator 
model, where $P_i$ has input $x_i \in \mathcal{X}$ for each $i \in [k]$, and the coordinator has $y \in \mathcal{Y}$. Given the distribution 
$\mu$ on $\mathcal{X} \times \mathcal{Y}$, we construct a corresponding distribution $\nu$ on $\mathcal{X}^k \times \mathcal{Y}$: we first pick $(X_1, Y) \sim \mu$, and then 
pick $X_2, \ldots, X_k$ from the conditional distribution $\mu \mid Y$. 



² We can also choose, for example, $P_1$ to be the coordinator and avoid the need for an additional site, though having an additional 
site makes the notation cleaner. 



The following theorem was originally proposed in [31]. Here we improve it by a $\log k$ factor by a 
slightly modified analysis, which we include here for completeness. 

Theorem 1 For any function $f : \mathcal{X} \times \mathcal{Y} \to \{0, 1\}$ and any distribution $\mu$ on $\mathcal{X} \times \mathcal{Y}$ for which $\mu(f^{-1}(1)) \le 
1/k^2$, we have $D_\nu^{1/k^2}(f^k_{\mathrm{OR}}) = \Omega(k \cdot E[D_\mu^{1/(100k)}(f)])$. 

Proof: Suppose Alice has X and Bob has Y with $(X, Y) \sim \mu$, and they want to compute $f(X, Y)$. 
They can use a protocol $\mathcal{P}$ for $f^k_{\mathrm{OR}}$ to compute $f(X, Y)$ as follows. The first step is an input reduction. 
Alice and Bob first pick a random $I \in [k]$ using shared randomness, which will later be fixed by the 
protocol to make it deterministic. Alice simulates $P_I$ by assigning it an input $X_I = X$. Bob simulates 
the coordinator and the remaining $k - 1$ players. He first assigns Y to the coordinator, and then samples 
$X_1, \ldots, X_{I-1}, X_{I+1}, \ldots, X_k$ independently according to the conditional distribution $\mu \mid Y$, and assigns 
$X_i$ to $P_i$ for each $i \in [k] \setminus I$. Now $\{X_1, \ldots, X_k, Y\} \sim \nu$. Since $\mu(f^{-1}(1)) \le 1/k^2$, with probability 
$(1 - 1/k^2)^{k-1} \ge 1 - 1/k$, we have $f(X_i, Y) = 0$ for all $i \in [k] \setminus I$. Consequently, 

$$f^k_{\mathrm{OR}}(X_1, \ldots, X_k, Y) = f(X, Y). \qquad (1)$$

We say such an input reduction is good. 

Alice and Bob construct a protocol $\mathcal{P}'$ for $f$ by independently repeating the input reduction twice, and 
running $\mathcal{P}$ on each input reduction. The probability that at least one of the two input reductions is good is 
at least $1 - 1/k^2$, and Bob can learn which reduction is good without any communication. This is because 
in the simulation he locally generates all $X_i$ ($i \in [k] \setminus I$) together with Y. On the other hand, by a union 
bound, the probability that $\mathcal{P}$ is correct for both input reductions is at least $1 - 2/k^2$. Note that if we 
can compute $f^k_{\mathrm{OR}}(X_1, \ldots, X_k, Y)$ correctly for a good input reduction, then by (1), $\mathcal{P}$ can also be used 
to correctly compute $f(X, Y)$. Therefore $\mathcal{P}'$ can be used to compute $f(X, Y)$ with probability at least 
$1 - 2/k^2 - 1/k^2 \ge 1 - 1/(100k)$. 

Since in each input reduction, $X_1, \ldots, X_k$ are independent and identically distributed, and since $I \in [k]$ 
is chosen randomly in the two input reductions, we have that in expectation over the choice of $I$, the 
communication between $P_I$ and the coordinator is at most a $2/k$ fraction of the expected total communication 
of $\mathcal{P}$ given inputs drawn from $\nu$. By linearity of expectation, if the expected communication cost of $\mathcal{P}$ for 
solving $f^k_{\mathrm{OR}}$ under input distribution $\nu$ with error probability at most $1/k^2$ is C, then the expected 
communication cost of $\mathcal{P}'$ for solving $f$ under input distribution $\mu$ with error probability at most $1/(100k)$ is 
$O(C/k)$. Finally, by averaging there exists a fixed choice of $I \in [k]$, so that $\mathcal{P}'$ is deterministic and for 
which the expected communication cost of $\mathcal{P}'$ for solving $f$ under input distribution $\mu$ with error probability 
at most $1/(100k)$ is $O(C/k)$. Therefore we have $D_\nu^{1/k^2}(f^k_{\mathrm{OR}}) = \Omega(k \cdot E[D_\mu^{1/(100k)}(f)])$. ∎ 

4.1 The 2-DISJ$^r$ Problem 

Now we choose a concrete function $f$ to be the set-disjointness problem. In this problem we have two 
parties: Alice has $x \subseteq [r]$ while Bob has $y \subseteq [r]$, and the parties want to compute 

$$\text{2-DISJ}^r(x, y) = \begin{cases} 1, & x \cap y \ne \emptyset, \\ 0, & \text{otherwise.} \end{cases}$$

Set-disjointness is a classical problem used in proving communication lower bounds. For example, in a 
database context it was recently used in proving streaming lower bounds for finding dense subgraphs Q. 
We define an input distribution $\tau_\beta$ for 2-DISJ$^r$ as follows. Let $\ell = (r + 1)/4$. With probability $\beta$, $x$ and $y$
are random subsets of $[r]$ such that $|x| = |y| = \ell$ and $|x \cap y| = 1$, while with probability $1 - \beta$, $x$ and $y$
are random subsets of $[r]$ such that $|x| = |y| = \ell$ and $x \cap y = \emptyset$. Razborov [33] proved that for $\beta = 1/4$,
$D_{\tau_{1/4}}(\text{2-DISJ}^r) = \Omega(r)$, and one can extend his arguments to any $\beta \in (0, 1/4]$, and to the expected
distributional communication complexity where the expectation is taken over the input distribution.
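As an illustration of the distribution just defined, the following Python sketch draws a pair $(x, y)$: both sets have size $\ell = (r+1)/4$, and they share exactly one element with probability $\beta$ and none otherwise. The function names, the use of a shuffled universe, and the requirement that $r + 1$ be divisible by 4 are assumptions made for this sketch only.

```python
import random


def sample_tau_beta(r, beta, rng):
    """Draw (x, y) as described in the text (illustrative sketch).

    Assumes r + 1 is divisible by 4 so that ell = (r + 1) / 4 is an integer.
    """
    assert (r + 1) % 4 == 0
    ell = (r + 1) // 4
    universe = list(range(1, r + 1))
    rng.shuffle(universe)  # uniformly random order of [r]
    if rng.random() < beta:
        # Intersecting case: exactly one shared element.
        shared = universe[0]
        x = {shared, *universe[1:ell]}
        y = {shared, *universe[ell:2 * ell - 1]}
    else:
        # Disjoint case.
        x = set(universe[:ell])
        y = set(universe[ell:2 * ell])
    return x, y


def two_disj(x, y):
    """2-DISJ as defined in the text: 1 if the sets intersect, else 0."""
    return 1 if x & y else 0
```

With `beta` set to 0 or 1 the branch taken is deterministic, which makes the two cases easy to inspect separately.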

Theorem 2 (EQ, Lemma 2.2) For any $\beta \in (0, 1/4]$, it holds that $E[D_{\tau_\beta}^{\beta/100}(\text{2-DISJ}^r)] = \Omega(r)$, where the
expectation is taken over the input distribution.

4.2 The OR-DISJ$^r$ Problem

If we choose $f$ to be 2-DISJ$^r$ and let $\mu = \tau_\beta$, then we call $f_{OR}$ in the coordinator model the OR-DISJ$^r$
Problem. By Theorem 1 and Theorem 2, we have

Theorem 3 $D_\nu^{1/k^2}(\text{OR-DISJ}^r) = \Omega(kr)$.

4.2.1 The Complexity of THRESH$^r_\theta$

We prove our lower bound for the setting of the parameter $\theta = (3r - 1)/4$. We define the following input
distribution $\zeta$ for $\text{THRESH}^r_{(3r-1)/4}$: We choose $\{X_1, \ldots, X_k, Y\} \sim \nu$ where $\nu$ is the input distribution for
OR-DISJ$^r$, and then simply use $\{X_1, \ldots, X_k\}$ as the input for $\text{THRESH}^r_{(3r-1)/4}$.



Lemma 2 Under the distribution $\zeta$, assuming $k \ge c_k \log r$ for a large enough constant $c_k$, we have that
$\bigvee_{i \in [k]} X_{i,j} = 1$ for all $j \in [r] \setminus Y$ with probability $1 - 1/k^{10}$.



Proof: For each $j \in [r] \setminus Y$, we have $\bigvee_{i \in [k]} X_{i,j} = 1$ with probability at least $1 - (1 - 1/4)^k$. This is
because $\Pr[X_{i,j} = 1] \ge 1/4$ for each $j \in [r] \setminus Y$, by our choices of $X_i$. By a union bound, with probability
at least

$$1 - (3/4)^k \cdot |[r] \setminus Y| = 1 - (3/4)^k \cdot (3r - 1)/4 \ge 1 - 1/k^{10}$$

(by our assumption $c_k \log r \le k \le r$ for a large enough constant $c_k$), we have $\bigvee_{i \in [k]} X_{i,j} = 1$ for all
$j \in [r] \setminus Y$. ■

Theorem 4 $D_\zeta^{1/k^3}(\text{THRESH}^r_{(3r-1)/4}) = \Omega(kr)$, assuming $c_k \log r \le k \le r$ for a large enough constant $c_k$.

Proof: By Lemma 2, it is easy to see that any protocol $\mathcal{P}$ that computes $\text{THRESH}^r_{(3r-1)/4}$ on input
distribution $\zeta$ correctly with error probability $1/k^3$ can be used to compute OR-DISJ$^r$ on distribution $\nu$ correctly
with error probability $1/k^3 + 1/k^{10} < 1/k^2$, since if $(X_1, \ldots, X_k, Y) \sim \nu$, then with probability $1 - 1/k^{10}$,
we have

$$\text{OR-DISJ}^r(X_1, \ldots, X_k, Y) = (\exists i \in [k],\ \text{2-DISJ}^r(X_i, Y) = 1) = \text{THRESH}^r_{(3r-1)/4}(X_1, \ldots, X_k).$$

The theorem follows from Theorem 3. ■
5 Statistical Problems

For technical convenience in the reductions, we make the mild assumption that $c_k \log n \le k \le n$ where
$c_k$ is some large enough constant. For convenience, we will repeatedly ignore an additive $O(1/k^{10})$ error
probability introduced in the reductions, since these will not affect the correctness of the reductions, and can
be added to the overall error probability by a union bound.

5.1 $F_0$ (#distinct-elements)

Recall that in the $F_0$ problem, each site $P_i$ has a set $S_i \subseteq [n]$, and the $k$ sites want to compute the number of
distinct elements in $\bigcup_{i \in [k]} S_i$.

For the lower bound, we reduce from $\text{THRESH}^n_{(3n-1)/4}$. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for
$\text{THRESH}^n_{(3n-1)/4}$, each site sets $S_i = X_i$. Let $\sigma_F$ be the input distribution of $F_0$ after this reduction.

By Lemma 2 we know that under distribution $\zeta$, with probability $1 - 1/k^{10}$, for all $j \in [n] \setminus Y$ (recall
that $Y$ is the random subset of $[n]$ of size $(n + 1)/4$ we used to construct $X_1, \ldots, X_k$ in distribution $\zeta$),
$\bigvee_{i \in [k]} X_{i,j} = 1$. Conditioned on this event, we have

$$\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1 \iff F_0\left(\bigcup_{i \in [k]} S_i\right) > (3n - 1)/4.$$

Therefore, by Theorem 4 we have that $D_{\sigma_F}^{1/k^3}(F_0) = \Omega(kn)$. Note that in this reduction, we have to choose
$n = \Theta(F_0)$. Therefore, it makes more sense to write the lower bound as $D_{\sigma_F}^{1/k^3}(F_0) = \Omega(kF_0)$.
The following corollary follows from Yao's Lemma (Lemma 1) and the discussion following it.

Corollary 1 $R^{1/3}(F_0) = \Omega(kF_0 / \log k)$, assuming $c_k \log F_0 \le k \le F_0$ for a large enough constant $c_k$.

Similar corollaries hold for all other problems in Section 5 and Section 6 using Yao's Lemma, and we will
not explicitly state such corollaries.

An almost matching upper bound of $O(k(F_0 \log F_0 + \log n))$ can be obtained as follows: the $k$ sites
first compute a 2-approximation $\tilde{F}_0$ to $F_0$ using the protocol in [11] (see Section 2), which costs $O(k \log n)$
bits. Next, they hash every element to a universe of size $(\tilde{F}_0)^3$, so that there are no collisions among hashed
elements with probability at least $1 - 1/F_0$, by a union bound. Finally, all sites send their distinct elements
(after hashing) to $P_1$ and then $P_1$ computes the number of distinct elements over the union of the $k$ sets
locally. This step costs $O(kF_0 \log F_0)$ bits of communication.
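The hashing step of this upper bound can be sketched in Python as follows. This is only a simulation under stated assumptions: the 2-approximation is passed in as `f0_estimate` rather than computed, the shared random hash function is modelled by a lazily filled table (`make_random_hash`, a hypothetical helper), and the bit count simply charges a fixed number of bits per hashed value.

```python
import random


def make_random_hash(universe_size, seed=0):
    """Hypothetical shared hash: a random function filled in lazily."""
    rng = random.Random(seed)
    table = {}

    def h(element):
        if element not in table:
            table[element] = rng.randrange(universe_size)
        return table[element]

    return h


def f0_protocol(site_sets, f0_estimate, hash_fn=None):
    """Simulate the hashing step; returns (distinct_count, bits_sent).

    f0_estimate stands in for the 2-approximation from the first step.
    """
    universe_size = max(2, f0_estimate ** 3)
    if hash_fn is None:
        hash_fn = make_random_hash(universe_size)
    bits_per_value = (universe_size - 1).bit_length()
    received, bits_sent = set(), 0
    for s in site_sets:
        hashed = {hash_fn(e) for e in s}      # each site de-duplicates locally
        bits_sent += len(hashed) * bits_per_value
        received |= hashed                    # the first site merges messages
    return len(received), bits_sent
```

If two distinct elements collide under the hash, the returned count is smaller than the true $F_0$; the cubic universe size is what makes this unlikely.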

5.2 $\ell_\infty$ (MAX)

In the $\ell_\infty$ problem, each site $P_i$ has a set $S_i \subseteq [n]$, and the $k$ sites want to find an element in $\bigcup_{i \in [k]} S_i$ with
the maximum frequency.

For the lower bound, we again reduce from $\text{THRESH}^n_{(3n-1)/4}$. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for
$\text{THRESH}^n_{(3n-1)/4}$, the $k$ sites create an input $\{S_1, \ldots, S_k\}$ as follows: first, $P_1$ chooses a set $R \subseteq [k]$ by
independently including each $i \in [k]$ with probability 7/8, and informs all sites $P_i$ ($i \in R$) by sending each
of them a bit. This step costs $O(k)$ bits of communication. Next, for each $i \in R$, $P_i$ flips $X_{i,j}$ for each
$j \in [n]$. Finally, each $P_i$ includes $j \in S_i$ if $X_{i,j} = 1$ after the flip and $j \notin S_i$ if $X_{i,j} = 0$. Let $\sigma_L$ be the
input distribution of $\ell_\infty$ after this reduction.

They repeat this input reduction independently $T$ times where $T = c_T \log k$ for a large constant $c_T$, and
at each time they run $\ell_\infty(\bigcup_{i \in [k]} S_i)$. Let $R_1, \ldots, R_T$ be the random set $R$ sampled by $P_1$ in the $T$ runs, and
let $O_1, \ldots, O_T$ be the outputs of the $T$ runs. They return $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1$ if there exists
a $t \in [T]$ such that $O_t \ge |R_t| + 1$ and 0 otherwise.

We focus on a particular input reduction. We view an input for $\text{THRESH}^n_{(3n-1)/4}$ as a $k \times n$ matrix.
The $i$-th row of the matrix is $X_i$. After the bit-flip operations, for each column $j \in [n] \setminus Y$, we have for each
$i \in [k]$ that

$$\Pr[X_{i,j} = 1] \le \frac{7}{8} \cdot \left(1 - \frac{(n+1)/4 - 1}{(3n-1)/4}\right) + \frac{1}{8} \cdot \frac{(n+1)/4}{(3n-1)/4} < 3/4.$$
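For readers who want to check the constant, the bound on $\Pr[X_{i,j} = 1]$ after the flips can be simplified as below. This is a worked computation added for illustration; it assumes only the sizes $|Y| = (n+1)/4$ and $|[n] \setminus Y| = (3n-1)/4$ stated earlier.

```latex
\begin{align*}
\Pr[X_{i,j} = 1]
  &\le \frac{7}{8}\left(1 - \frac{(n+1)/4 - 1}{(3n-1)/4}\right)
     + \frac{1}{8}\cdot\frac{(n+1)/4}{(3n-1)/4} \\
  &= \frac{7}{8}\cdot\frac{2n+2}{3n-1} + \frac{1}{8}\cdot\frac{n+1}{3n-1}
   = \frac{15(n+1)}{8(3n-1)} .
\end{align*}
% The right-hand side is below 3/4 exactly when 15n + 15 < 18n - 6, i.e. n > 7,
% and it tends to 5/8 as n grows.
```

So the inequality holds for every $n > 7$, with a constant gap between $5/8$ and the $13/16$ threshold used in the Chernoff bound that follows.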

By a Chernoff bound, for each $j \in [n] \setminus Y$, $\sum_{i \in [k]} X_{i,j} < 13k/16$ with probability $1 - e^{-\Omega(k)}$. Therefore
with probability at least $(1 - e^{-\Omega(k)} \cdot n) \ge (1 - 1/k^{10})$ (assuming that $c_k \log n \le k \le n$ for a large enough
constant $c_k$), $\sum_{i \in [k]} X_{i,j} < 13k/16$ holds for all $j \in [n] \setminus Y$.

Now we consider columns in $Y$. We can show again by Chernoff bound that $|R| > 13k/16$ with
probability $(1 - 1/k^{10})$ for all columns in $Y$, since each $i \in [k]$ is included into $R$ with probability 7/8, and
before the flips, the probability that $X_{i,j} = 1$ for an $i$ when $j \in Y$ is negligible. Therefore with probability
$(1 - 1/k^{10})$, the column with the maximum number of 1s is in the set $Y$, which we condition on in the rest
of the analysis.

In the case when $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1$, then with probability at least 1/8, there exists
a column $j \in Y$ and a row $i \in [k] \setminus R$ for which $X_{i,j} = 1$. If this happens, then for this $j$ we have
$\sum_{i \in [k]} X_{i,j} \ge |R| + 1$, or equivalently, $\ell_\infty(\bigcup_{i \in [k]} S_i) \ge |R| + 1$. Otherwise, if $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 0$,
then $\sum_{i \in [k]} X_{i,j} = |R|$ for all $j \in Y$. Therefore, if $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1$, then the
probability that there exists a $t \in [T]$ such that $O_t \ge |R_t| + 1$ is at least $1 - (1 - 1/8)^T > 1 - 1/k^{10}$ (by choosing
$c_T$ large enough). Otherwise, if $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 0$, then $O_t = |R_t|$ for all $t \in [T]$.
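The effect of the bit flips on the columns in $Y$ can be checked on a toy matrix. The sketch below is illustrative only: the helper names `apply_flips` and `max_column_sum` are hypothetical, indices are 0-based, and the tiny inputs in the usage note are hand-picked rather than drawn from $\zeta$.

```python
def apply_flips(rows, flipped):
    """Flip every bit of rows[i] for each i in `flipped` (the set R)."""
    return [[1 - b for b in row] if i in flipped else list(row)
            for i, row in enumerate(rows)]


def max_column_sum(rows, columns):
    """Largest number of 1s among the given columns."""
    return max(sum(row[j] for row in rows) for j in columns)
```

For example, with $Y = \{0\}$, $R = \{0, 1, 2\}$ and four rows that all avoid column 0, the flipped matrix has exactly $|R| = 3$ ones in column 0. If the unflipped fourth row instead has a 1 in column 0, the count becomes $|R| + 1 = 4$, which is the gap the reduction detects.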

Our reduction only uses $T \cdot O(k) = O(k \log k)$ extra bits of communication and introduces an
extra error of $O(1/k^{10})$, neither of which affects the correctness of the reduction. By Theorem 4 we have that
$D_{\sigma_L}^{1/k^3}(\ell_\infty) = \Omega(kn)$. Note that in the reduction, we have to assume that $\ell_\infty = \Theta(k)$. In other words, if
$\ell_\infty \ll k$ then we have to choose $k' = \Theta(\ell_\infty)$ sites out of the $k$ sites to perform the reduction. Therefore it
makes sense to write the lower bound as $D_{\sigma_L}^{1/k^3}(\ell_\infty) = \Omega(\min\{\ell_\infty, k\} n)$.

Finally, a simple protocol in which all sites send their elements-counts to the first site solves $\ell_\infty$ with
$O(\min\{k, \ell_\infty\} n \log n)$ bits of communication, which is almost optimal in light of our lower bound above.

6 Graph Problems 

In this section we consider graph problems. Let $G = (V, E)$ with $|V| = n$ and $|E| = m$ be an undirected
graph. Each site has a subgraph $G_i \subseteq G$, and the $k$ sites want to compute a property of $G$ via a
communication protocol. For technical convenience we again assume that $c_k \log n \le k \le \min\{n, m\}$, where $c_k$
is a large enough constant. Except for the upper bound for cycle-freeness in the without edge duplication
case, for which $m \ge n$ always causes the graph to not be cycle-free, we assume that $m \ge n$ to avoid an
uninteresting case-by-case analysis.

Most lower bounds in this section are shown by reductions from $\text{THRESH}^r_{(3r-1)/4}$ for some value $r \le n^2$.
For convenience of presentation, during some reductions we may generate graphs with more than $n$
vertices. This will not affect the order of the lower bounds as long as the number of vertices is $O(n)$ and the
number of edges is $O(m)$ (if $m$ appears in the lower bound as a parameter).

The following procedure will be used several times in our reductions. Thus, we present it separately. 

Reconstructing $Y$ from $X_1, X_2, \ldots, X_k$. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^r_{(3r-1)/4}$ to
the $k$ sites, the first site $P_1$ can construct $Y$ correctly with probability $1 - O(1/k^{10})$, using $O(r \log r)$ bits
of communication, assuming that $c_k \log r \le k \le r$ for a large enough constant $c_k$. We view the input as a
$k \times r$ matrix with the $k$ sites' inputs as rows. For each column $j \in [r]$, $P_1$ randomly selects $c_Y \log r$ sites for
some large enough constant $c_Y$, asks each of them for the $j$-th bit of their input vectors, and then computes
the sum of these bits, denoted by $s_j$. Note that if $j \in [r] \setminus Y$, then by a Chernoff bound with probability
$1 - e^{-\kappa \cdot c_Y \log r}$ ($\kappa$ is an absolute constant), we have that $s_j \ge c_Y \log r / 2$. Therefore with probability
$1 - e^{-\kappa \cdot c_Y \log r} \cdot r \ge 1 - 1/k^{10}$, for all $j \in [r] \setminus Y$, it holds that $s_j \ge c_Y \log r / 2$. On the other hand, again by
a Chernoff bound we have that with probability $1 - 1/k^{10}$, for all $j \in Y$, $s_j < c_Y \log r / 2$. Therefore $P_1$ can
reconstruct $Y$ correctly with probability $1 - O(1/k^{10})$. Since the $O(r \log r)$ extra bits of communication
and the $O(1/k^{10})$ additional error probability will not affect the correctness of any of our reductions below,
we can simply assume that $P_1$ can always reconstruct $Y$ for free.
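The sampling procedure can be sketched in Python as below. Several details are assumptions of this sketch rather than statements from the text: sites are sampled without replacement, the constant `c_y` has an arbitrary default, indices are 0-based, and the cut-off between the two cases is exposed as the parameter `threshold_fraction` with an illustrative default value.

```python
import math
import random


def reconstruct_y(rows, c_y=40, threshold_fraction=0.125, seed=0):
    """rows[i][j] is bit j of site i's input.

    Returns (guessed_Y, bits_received); a sketch with assumed constants.
    """
    rng = random.Random(seed)
    k, r = len(rows), len(rows[0])
    samples = min(k, max(1, math.ceil(c_y * math.log(r))))
    y_guess, bits_received = set(), 0
    for j in range(r):
        chosen = rng.sample(range(k), samples)   # sites queried for column j
        s_j = sum(rows[i][j] for i in chosen)    # sum of the reported bits
        bits_received += samples                 # one bit per queried site
        if s_j < threshold_fraction * samples:
            y_guess.add(j)                       # few 1s: column looks like Y
    return y_guess, bits_received
```

The design relies on the gap between the two kinds of columns: columns outside $Y$ contain a constant fraction of 1s, while columns in $Y$ contain almost none, so any cut-off strictly between the two densities separates them with high probability.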

6.1 Degree 

In the degree problem, given a vertex $v \in V$, the $k$ sites want to compute the degree of $v$.

If edge duplication is not allowed, then the degree problem can be solved in $O(k \log n)$ bits of
communication: each site sends the number of edges containing the query vertex to the first site $P_1$ and then $P_1$
adds up these $k$ numbers. A lower bound of $\Omega(k)$ bits also holds since each site has to speak at least once in
our communication model.

When we allow edge duplication, the degree problem is essentially the same as that of $F_0$, by the
following reduction. Given an input $\{X_1, \ldots, X_k\} \sim \sigma_F$ for $F_0$, the $k$ sites construct a graph $G = (V, E)$
where $V = \{v_1, \ldots, v_n\}$ of size $n$. Each site $P_i$ ($i \in [k]$) does the following: for each element $j \in X_i$ and
$j \neq 1$, it creates an edge $(v_1, v_j)$. Let $G$ be the resulting graph. Then

$$F_0(X_1, \ldots, X_k) = \text{Deg}_G(v_1) - 1.$$

Thus, we obtain a lower bound of $\Omega(k d_v)$ bits for the degree problem, where $d_v$ is the degree of the queried
vertex $v$. An $O(k d_v \log n)$ bit upper bound is the following: each site sends all the neighbouring vertices of
$v$ to the first site.
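The difference between the two settings is easy to see in code. In the minimal Python sketch below (function names and the edge-list representation are assumptions for illustration), the first function is the counting protocol for the case without edge duplication, and the second merges neighbour sets, which is what duplication forces.

```python
def degree_no_duplication(site_edges, v):
    """Each site reports one number; the first site adds them up."""
    counts = [sum(1 for e in edges if v in e) for edges in site_edges]
    return sum(counts)


def degree_with_duplication(site_edges, v):
    """Each site reports its local neighbours of v; duplicates are merged."""
    neighbours = set()
    for edges in site_edges:
        for a, b in edges:
            if a == v:
                neighbours.add(b)
            elif b == v:
                neighbours.add(a)
    return len(neighbours)
```

When the same edge is held by several sites, summing local counts overcounts, so the sites have to communicate the neighbours themselves rather than a single number.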

6.2 Cycle-freeness 

In the cycle-freeness problem, the k sites want to check whether G contains a cycle. 

6.2.1 Without Edge Duplication 

If edge duplication is not allowed, then we have the following simple protocol: $P_2, \ldots, P_k$ send the number
of their local edges to $P_1$ and $P_1$ computes the total number of edges in the graph $G$, denoted by $m$. If
$m \ge n$ then $P_1$ determines immediately that $G$ contains a cycle, since every graph on $n$ vertices having at
least $n$ edges must contain a cycle. Otherwise if $m < n$, then $P_2, \ldots, P_k$ send all their edges to $P_1$, who then
does a local check. The communication cost of this protocol never exceeds $O(k \log n + \min\{m, n\} \log n) =
O(\min\{m, n\} \log n)$ bits.
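A minimal Python simulation of this two-phase protocol is given below. It is a sketch under assumptions: vertices are numbered $0, \ldots, n-1$, each site is an edge list, the local check uses union-find (one of several possible choices), and communication is counted in "words" (one edge count or one edge) rather than bits.

```python
def has_cycle(n, edges):
    """Union-find cycle check on vertices 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True                     # edge closes a cycle
        parent[ru] = rv
    return False


def cycle_protocol(n, site_edges):
    """Returns (contains_cycle, words_sent) for the no-duplication case."""
    words = len(site_edges) - 1             # P_2..P_k report their edge counts
    m = sum(len(edges) for edges in site_edges)
    if m >= n:
        return True, words                  # at least n edges force a cycle
    words += sum(len(edges) for edges in site_edges[1:])
    all_edges = [e for edges in site_edges for e in edges]
    return has_cycle(n, all_edges), words
```

The early exit is what keeps the cost at $\min\{m, n\}$ edges: the sites only ship their edges when fewer than $n$ exist in total.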




Let $h = \min\{m, n\}$. An $\Omega(h)$ bit lower bound holds even when $k = 2$, by a reduction from the 2-DISJ$^h$
problem: suppose $P_1$ has $X$ and $P_2$ has $Y$, where $(X, Y) \sim \tau_{1/4}$ is the hard input distribution for 2-DISJ$^h$.
$P_1$ and $P_2$ construct a graph $G$ on the vertex set $\{s, t, v_1, \ldots, v_h\}$ as follows: for each $i \in X$, $P_1$ creates
an edge $(s, v_i)$, and he/she also creates an additional edge $(s, t)$. Similarly, for each $i \in Y$, $P_2$ creates an
edge $(v_i, t)$. Let $\sigma_{C_1}$ be the resulting input distribution of $G$. It is easy to see from the reduction that if
$X \cap Y \neq \emptyset$, then there is a cycle in the form of $s \to t \to v_i \to s$ for some $i \in [h]$. Otherwise the graph is a
forest. Therefore,

$$\text{2-DISJ}^h(X, Y) = 1 \iff G \text{ contains a cycle.}$$

Therefore by Theorem 2 it follows that $D_{\sigma_{C_1}}(\text{cycle-freeness}) = \Omega(h)$.

6.2.2 With Edge Duplication 

For the lower bound, we reduce from $\text{THRESH}^n_{(3n-1)/4}$. For each $j \in [n]$, $G$ contains a vertex $v_j$. $G$ also
contains a special vertex $u$. The total number of vertices in $G$ is $n + 1$.

Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^n_{(3n-1)/4}$, the $k$ sites create a graph $G$ for cycle-freeness
as follows. Each $P_i$ creates an edge $(u, v_j)$ for each $X_{i,j} = 1$. In addition, $P_1$ reconstructs $Y$, picks
an arbitrary set $Y' \subseteq [n] \setminus Y$ of size $(n + 1)/4$, and creates an arbitrary perfect matching between $Y$
and $Y'$. Let $\sigma_{C_2}$ be the resulting input distribution of $G$. By Lemma 2 we have with probability
$(1 - 1/k^{10})$ that all pairs $(u, v_j)$ for $j \in [n] \setminus Y$ are connected. It is easy to see from the reduction that if
$\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1$, then there is a cycle of the form $u \to v_i \to v_j \to u$ for some $i \in [n] \setminus Y$
and $j \in Y$. Otherwise the graph is a forest. Therefore,

$$\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 0 \iff G \text{ is cycle-free.}$$

Thus by Theorem 4 we have that $D_{\sigma_{C_2}}^{1/k^3}(\text{cycle-freeness}) = \Omega(kn)$.

There is again a trivial $O(kn \log n)$ upper bound. Each site first checks its local graph and reports
directly if a cycle is found, otherwise the site sends all its edges (there are no more than $n - 1$ such edges
for a cycle-free graph) to $P_1$. Finally $P_1$ checks the cycle-freeness on the union of those edges.

6.3 Connectivity and #CC 

In the connectivity problem, the k sites want to check whether G is connected. In the #connected-components 
(#CC) problem, the k sites want to compute the number of connected components in G. Note that solving 
#CC also solves connectivity, thus we only show the lower bound for connectivity. 

6.3.1 Without Edge Duplication 

For the lower bound, we reduce from $\text{THRESH}^r_{(3r-1)/4}$ where $r = \min\{m/k, n\}$. For each $i \in [k]$, $G$
contains a vertex $u_i$; and for each $j \in [r]$, $G$ contains a vertex $v_j$. The total number of vertices in $G$ is
$n + k \le 2n$. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^r_{(3r-1)/4}$, the $k$ sites create a graph $G$ for
connectivity as follows. Each $P_i$ creates an edge $(u_i, v_j)$ for each $X_{i,j} = 1$. In addition, $P_1$ reconstructs $Y$,
and then creates a path containing $\{v_j \mid j \in Y\}$ and a path containing $\{v_j \mid j \in [r] \setminus Y\}$. See Figure 1 for an
illustration. Let $\sigma_{N_1}$ be the resulting input distribution of $G$. It is easy to see from the reduction that

$$\text{THRESH}^r_{(3r-1)/4}(X_1, \ldots, X_k) = 1 \iff G \text{ is connected.}$$

Figure 1: Graph $G$ in the reduction for connectivity. The edge $(u_i, v_j)$ exists if and only if $X_{i,j} = 1$.

Thus by Theorem 4 we have that $D_{\sigma_{N_1}}^{1/k^3}(\text{connectivity}) = \Omega(kr)$.

For the upper bound, all connected components (thus #CC and connectivity) can be found by the protocol
in which all sites send their local spanning trees to the first site $P_1$ and then $P_1$ does a local computation,
which costs $O(kr \log n)$ bits of communication.
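The spanning-tree protocol can be simulated as follows. This Python sketch assumes vertices numbered $0, \ldots, n-1$ and edge lists per site; the `DSU` class and function names are illustrative, and communication is counted as the number of edges sent to the first site.

```python
class DSU:
    """Union-find over vertices 0..n-1, tracking the component count."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.components -= 1
        return True


def spanning_forest(n, edges):
    """Local step: keep only edges that join two different components."""
    dsu = DSU(n)
    return [(u, v) for u, v in edges if dsu.union(u, v)]


def count_components(n, site_edges):
    """Returns (#CC, edges_sent); the first site's own forest is not sent."""
    coordinator = DSU(n)
    sent = 0
    for idx, edges in enumerate(site_edges):
        forest = spanning_forest(n, edges)
        if idx > 0:
            sent += len(forest)
        for u, v in forest:
            coordinator.union(u, v)
    return coordinator.components, sent
```

Each local forest has fewer than $n$ edges, so no site ever sends more than $n - 1$ edges regardless of how many it holds; the graph is connected exactly when the returned count is 1.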



6.3.2 With Edge Duplication 

For the lower bound, we use a slightly modified reduction of the one for the without edge duplication case.
We reduce from $\text{THRESH}^n_{(3n-1)/4}$. For each $j \in [n]$, $G$ contains a vertex $v_j$. $G$ also contains a special vertex
$u$. The total number of vertices in $G$ is $n + 1$. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^n_{(3n-1)/4}$,
the $k$ sites create a graph $G$ for connectivity as follows. Each $P_i$ creates an edge $(u, v_j)$ for each $X_{i,j} = 1$.
In addition, $P_1$ reconstructs $Y$, and then creates a path containing $\{v_j \mid j \in Y\}$ and a path containing
$\{v_j \mid j \in [n] \setminus Y\}$. The total number of edges is $O(n) = O(m)$. This graph can be seen as the graph
we constructed for the without edge duplication case after merging $u_1, \ldots, u_k$ to a single vertex $u$ while
maintaining all the adjacent edges. Let $\sigma_{N_2}$ be the resulting input distribution of $G$. The correctness of the
reduction is the same as before, that is, $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 1$ if and only if $G$ is connected.

Thus by Theorem 4 we have that $D_{\sigma_{N_2}}^{1/k^3}(\text{connectivity}) = \Omega(kn)$.

The upper bound is the same as the without edge duplication case, and the cost is $O(kn \log n)$ bits
(note that since we allow edge duplication here, the total number of edges of the $k$ spanning trees cannot be
bounded by $O(m)$).



6.4 Bipartiteness 

In the bipartiteness problem, the k sites want to check whether G is bipartite. 



6.4.1 Without Edge Duplication 



For the lower bound, we reduce from $\text{THRESH}^r_{(3r-1)/4}$ where $r = \min\{m/k, n\}$. Given an input
$\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^r_{(3r-1)/4}$, the $k$ sites create a graph $G = (V, E)$ with $V = A \cup B \cup C$
where $A = \{a_1, \ldots, a_r\}$, $B = \{b_1, \ldots, b_r\}$ and $C = \{c_1, \ldots, c_k\}$, as follows: each $P_i$ creates an edge
$(c_i, b_j)$ for each $X_{i,j} = 1$. In addition, $P_1$ does the following:
1. Creates an edge $(a_i, b_i)$ for each $i \in [r]$.

2. Reconstructs $Y$. For each $i \in [k]$ and $j \in Y$, creates an edge $(c_i, a_j)$.

The total number of vertices of $G$ is $2r + k \le 3n$. Let $\sigma_{B_1}$ be the resulting input distribution of $G$. One can
see from the reduction that if $\text{THRESH}^r_{(3r-1)/4}(X_1, \ldots, X_k) = 1$, then there exists at least one triangle in
the form of $(a_i, b_i, c_j)$ for some $i \in [r]$, $j \in [k]$. Otherwise if

$$\text{THRESH}^r_{(3r-1)/4}(X_1, \ldots, X_k) = 0,$$

then all edges are between two vertex sets $\{a_i \mid i \in [r] \setminus Y\} \cup C \cup \{b_i \mid i \in Y\}$ and
$\{b_i \mid i \in [r] \setminus Y\} \cup \{a_i \mid i \in Y\}$ and consequently $G$ is a bipartite graph. Therefore,

$$\text{THRESH}^r_{(3r-1)/4}(X_1, \ldots, X_k) = 0 \iff G \text{ is bipartite.}$$

Thus by Theorem 4 we have that $D_{\sigma_{B_1}}^{1/k^3}(\text{bipartiteness}) = \Omega(kr)$.

For the upper bound, we can assume that the graph is connected, since otherwise we can first compute
all connected components (which costs $O(kn \log n)$ bits of communication as mentioned in Section 6.3),
and then work on each connected component. The protocol works as follows: the first site $P_1$ chooses an
arbitrary vertex $u$ in the graph, and grows a breadth-first-search (BFS) tree rooted at $u$ by communicating
with the other $k - 1$ sites. In the first round, $P_1$ asks each site to report the vertices adjacent to $u$ using
its local edges. The communication is at most $O(|N(u)| \log n \cdot k)$ bits, where $N(u)$ denotes the set of
neighbors of $u$, and $|N(u)|$ denotes the number of (distinct) neighbors of $u$. From this, $P_1$ computes the
entire set $N(u)$ of neighbors of $u$, without duplication, and sends it to the other $k - 1$ sites. This also takes
$O(|N(u)| \log n \cdot k)$ bits of communication. Now the sites all know $N(u)$, and they can build the first layer
of the BFS tree rooted at $u$. Next, $P_1$ picks the first child $v$ (according to an arbitrary but fixed order) of
$u$, and repeats this process on $v$. If ever a site finds an odd cycle, it is announced to $P_1$. Notice that every
vertex is sent at most $k$ times to $P_1$, meanwhile the total number of vertices sent is no more than $O(m)$, so
the total communication is $O(kr \log n)$ bits.
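The BFS-based protocol can be simulated in a few lines of Python. The sketch below makes simplifying assumptions: the coordinator performs the 2-colouring itself (rather than sites announcing odd cycles), disconnected graphs are handled by restarting the BFS from every uncoloured vertex, vertices are numbered $0, \ldots, n-1$, and the function name and the counter of vertex ids sent to the coordinator are illustrative.

```python
from collections import deque


def is_bipartite_distributed(n, site_edges):
    """Returns (is_bipartite, vertex_ids_sent_to_coordinator)."""
    # Each site indexes its own edges locally.
    local_adj = []
    for edges in site_edges:
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        local_adj.append(adj)

    colour = [None] * n
    ids_sent = 0
    for root in range(n):
        if colour[root] is not None:
            continue
        colour[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            neighbours = set()
            for adj in local_adj:            # every site reports N(u) locally
                reported = adj.get(u, set())
                ids_sent += len(reported)
                neighbours |= reported       # coordinator merges duplicates
            for v in neighbours:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False, ids_sent   # odd cycle found
    return True, ids_sent
```

In a bipartite run every vertex is expanded once, so the number of ids sent equals the sum of local degrees over all sites, which matches the counting argument in the paragraph above.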

6.4.2 With Edge Duplication 

We reduce from $\text{THRESH}^n_{(3n-1)/4}$. The reduction is a simple modification of the one for the without edge
duplication case. Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^n_{(3n-1)/4}$, the $k$ sites create a graph
$G = (V, E)$ with $V = A \cup B \cup C$ where $A = \{a_1, \ldots, a_n\}$, $B = \{b_1, \ldots, b_n\}$ and $C = \{c\}$, as
follows: each $P_i$ creates an edge $(c, b_j)$ for each $X_{i,j} = 1$. In addition, $P_1$ creates an edge $(a_i, b_i)$ for each
$i \in [n]$, reconstructs $Y$, and for each $j \in Y$ creates an edge $(c, a_j)$. The total number of vertices of $G$ is
$2n + 1$. Let $\sigma_{B_2}$ be the resulting input distribution of $G$. The correctness of the reduction is similar as before,
that is, $\text{THRESH}^n_{(3n-1)/4}(X_1, \ldots, X_k) = 0$ if and only if $G$ is bipartite. Thus by Theorem 4 we have that
$D_{\sigma_{B_2}}^{1/k^3}(\text{bipartiteness}) = \Omega(kn)$.

The upper bound is the same as the without edge duplication case, and the communication complexity is
$O(kn \log n)$ bits. Note that since we have edge duplication here, the claim that "the total number of vertices
sent is no more than $O(m)$" does not hold.

6.5 Triangle-freeness 

In the triangle-freeness problem, the k sites want to check whether G contains a triangle. 
Figure 2: Graph $G$ in the reduction for triangle-freeness. The edge $(a_p, c_q)$ exists if and only if $i \in X$ where
$i = (p - 1)n + q$; the edge $(b_p, c_q)$ exists if and only if $i \in Y$ where $i = (p - 1)n + q$.



6.5.1 Without Edge Duplication 

An $O(m \log n)$ upper bound is the following: $P_2, \ldots, P_k$ send all their edges to $P_1$ and then $P_1$ does a local
check.

There is an $\Omega(m)$ bit lower bound on the communication which holds even when $k = 2$, by a reduction
from 2-DISJ$^m$. Suppose $P_1$ holds $X$ and $P_2$ holds $Y$, where $\{X, Y\} \sim \tau_{1/4}$ is the hard input distribution
for 2-DISJ$^m$. Sites $P_1$ and $P_2$ construct a graph $G = (V, E)$ with $V = A \cup B \cup C$ where $A = \{a_1, \ldots, a_n\}$,
$B = \{b_1, \ldots, b_n\}$ and $C = \{c_1, \ldots, c_n\}$ as follows: for each $i \in X$, $P_1$ creates an edge $(a_p, c_q)$ such that
$(p - 1)n + q = i$ ($p, q \in [n]$) (note that the solution of $(p, q)$ is unique). He/she also creates an edge $(a_t, b_t)$
for all $t \in [n]$. Similarly, for each $i \in Y$, $P_2$ creates an edge $(b_p, c_q)$ such that $(p - 1)n + q = i$ ($p, q \in [n]$).
The graph $G$ has $3n$ vertices and $O(m)$ edges. See Figure 2 for an illustration. Let $\sigma_{T_1}$ be the resulting input
distribution of $G$. One can see from the reduction that if $X \cap Y \neq \emptyset$, then there is a triangle in the form of
$(a_p, b_p, c_q)$ for some $p, q \in [n]$. Otherwise the graph is triangle-free. Therefore,

$$\text{2-DISJ}^m(X, Y) = 0 \iff G \text{ is triangle-free.}$$

Therefore by Theorem 2 we have $D_{\sigma_{T_1}}(\text{triangle-freeness}) = \Omega(m)$.
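The index mapping $i \mapsto (p, q)$ and the resulting graph can be made concrete with a short Python sketch. Vertex labels such as `("a", p)`, the function names, and the brute-force triangle check are assumptions chosen for illustration; indices are 1-based as in the text.

```python
def index_to_pair(i, n):
    """Unique (p, q) in [n] x [n] with (p - 1) * n + q == i, for 1 <= i <= n * n."""
    p, q = divmod(i - 1, n)
    return p + 1, q + 1


def build_graph(x, y, n):
    """Edges created by the two sites in the reduction (as a set of pairs)."""
    edges = set()
    for t in range(1, n + 1):
        edges.add((("a", t), ("b", t)))          # matching edges (a_t, b_t)
    for i in x:
        p, q = index_to_pair(i, n)
        edges.add((("a", p), ("c", q)))          # first site's edges
    for i in y:
        p, q = index_to_pair(i, n)
        edges.add((("b", p), ("c", q)))          # second site's edges
    return edges


def has_triangle(edges):
    """True if some edge's endpoints share a common neighbour."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return any(adj[u] & adj[v] for u, v in edges)
```

Because the only edges between $A$ and $B$ are the matching edges $(a_t, b_t)$, any triangle must use one of them together with a common neighbour $c_q$, which exists exactly when the corresponding index lies in both sets.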

6.5.2 With Edge Duplication 

We reduce from $\text{THRESH}^m_{(3m-1)/4}$, and prove an $\Omega(km)$ lower bound on the communication cost. The
reduction is an extension of the one for the without edge duplication case.

Given an input $\{X_1, \ldots, X_k\} \sim \zeta$ for $\text{THRESH}^m_{(3m-1)/4}$, the $k$ sites create the following input graph
$G = (V, E)$ for triangle-freeness with $V = A \cup B \cup C$ where $A = \{a_1, \ldots, a_n\}$, $B = \{b_1, \ldots, b_n\}$ and
$C = \{c_1, \ldots, c_n\}$. Each site $P_i$ does the following: for each $j \in [m]$ such that $X_{i,j} = 1$, the site creates an
edge $(a_p, c_q)$ such that $(p - 1)n + q = j$ ($p, q \in [n]$). In addition, the first site $P_1$ also does the following.

1. Creates an edge $(a_t, b_t)$ for each $t \in [n]$.

2. Reconstructs $Y$. For each $j \in Y$, creates an edge $(b_p, c_q)$ such that $(p - 1)n + q = j$ ($p, q \in [n]$).






Let $\sigma_{T_2}$ be the resulting input distribution of $G$. As before, it is easy to see that if
$\text{THRESH}^m_{(3m-1)/4}(X_1, \ldots, X_k) = 1$, then there is a triangle of the form $(a_p, b_p, c_q)$ for some $p, q \in [n]$.
Otherwise the graph is triangle-free. Therefore,

$$\text{THRESH}^m_{(3m-1)/4}(X_1, \ldots, X_k) = 0 \iff G \text{ is triangle-free.}$$

By Theorem 4 we have that $D_{\sigma_{T_2}}^{1/k^3}(\text{triangle-freeness}) = \Omega(km)$.

There is a simple protocol with $O(km \log n)$ bits of communication: each site sends all its edges to $P_1$
and then $P_1$ does a local check.

We comment that our upper and lower bounds also apply to testing clique-freeness, that is, the $k$ sites
want to check whether $G$ contains a clique of size $s$ for a fixed constant $s$.

6.6 A Conjecture on the Diameter Problem and an Approximation Algorithm 

We would like to mention the diameter problem which cannot be solved by the technique introduced in this 
paper. In the diameter problem, the k sites want to compute the diameter of a graph G = (V, E) in which 
the edges are distributed amongst the k sites. We conjecture the following: 

Conjecture 1 The randomized communication complexity of the diameter problem in the message-passing
model is $\Omega(km)$ bits, assuming edge duplication is allowed.

Note that the naive algorithm in which every site sends all of its edges to the first site will match this 
lower bound up to a logarithmic factor. 

In [13] an algorithm for constructing a graph spanner in the RAM model with an additive distortion
2 is proposed. A graph spanner with an additive distortion $d$ preserves all pairwise distances of vertices
in the original graph up to an additive error $d$. This algorithm can be easily implemented in the
message-passing model with a communication complexity of $O(kn^{3/2} \log^2 n)$ bits. This fact gives us a hint that in
order to prove an $\Omega(kn^2)$ (in the case when $m = \Theta(n^2)$) lower bound, we have to explore the difficulty of
distinguishing a diameter $T$ versus $T + 1$ for a value $T \in [1, n - 2]$ (in fact, one can show that $T$ also needs to
be a fixed constant). Now we briefly describe how to implement the algorithm in [13] in the message-passing
model.

The algorithm in [13] works as follows (in the RAM model).



1. We pick $\Theta(\sqrt{n} \log n)$ vertices in the graph uniformly at random, and grow a breadth-first-search
(BFS) tree rooted on each of these vertices. We then include all edges of these BFS trees into the
spanner.

2. We include all edges incident to vertices whose degrees are no more than $\sqrt{n}$.
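The two steps above can be sketched in the RAM model as follows. This Python sketch is not the implementation from [13]: the adjacency-dictionary representation, the constant `c` in the sample size, the fixed seed, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def bfs_tree_edges(adj, root):
    """Edges of a BFS tree of root's component, as frozensets {u, v}."""
    parent = {root: None}
    queue = deque([root])
    edges = set()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                edges.add(frozenset((u, v)))
                queue.append(v)
    return edges


def additive_spanner(adj, c=1.0, seed=0):
    """adj maps each vertex to the set of its neighbours (undirected graph)."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    n = len(vertices)
    # Sample size c * sqrt(n) * log n, capped at n (c is an assumed constant).
    samples = min(n, math.ceil(c * math.sqrt(n) * math.log(max(n, 2))))
    spanner = set()
    # Step 1: BFS trees rooted at randomly sampled vertices.
    for root in rng.sample(vertices, samples):
        spanner |= bfs_tree_edges(adj, root)
    # Step 2: all edges incident to low-degree vertices.
    for u in vertices:
        if len(adj[u]) <= math.sqrt(n):
            spanner |= {frozenset((u, v)) for v in adj[u]}
    return spanner
```

The intuition behind the two steps is that a vertex of degree above $\sqrt{n}$ is likely to have a sampled neighbour, whose BFS tree then gives near-shortest paths through it, while paths that use only low-degree vertices are kept intact by the second step.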

In [13] it was shown that this algorithm computes a spanner with an additive distortion 2 correctly with
probability 0.99. Now let us briefly discuss how to implement this algorithm in the message-passing model.
For the first step, the random sampling can be done by $P_1$ locally, and then $P_1$ communicates with the other
$k - 1$ sites to grow a BFS tree rooted on each of these vertices using the algorithm described in Section 6.4.
Recall that constructing each of these BFS trees costs $O(kn \log n)$ bits of communication. Thus the total
communication needed for the first step is $O(kn^{3/2} \log^2 n)$ bits. For the second step, the $k$ sites first use
the algorithm in [11] to compute the degree of each vertex (this is essentially $F_0$) up to a factor of 2, using
$O(kn \log n)$ bits of communication, and then they construct the set $H = \{v \in V \mid \text{degree}(v) \le 2\sqrt{n}\}$.
Next, $P_2, \ldots, P_k$ send $P_1$ all edges that are incident to a vertex in $H$. Note that each of $P_i$ ($i = 2, \ldots, k$)
will send at most $O(n^{3/2})$ edges to $P_1$. Therefore, the total communication cost of the second step is
bounded by $O(kn \log n + kn^{3/2} \log n) = O(kn^{3/2} \log n)$ bits. We conclude with the following theorem for
diameter.

Theorem 5 There exists a randomized protocol that approximates the diameter of a graph up to an
additive error of 2 in the message-passing model. The protocol succeeds with probability 0.99 and uses
$O(kn^{3/2} \log n)$ bits of communication.

7 Concluding Remarks 

In this paper we show that exact computation of many basic statistical and graph problems in the
message-passing model is necessarily communication-inefficient. An important message we want to deliver through
these negative results, which is also the main motivation of this paper, is that a relaxation of the problem,
such as an approximation, is necessary in the distributed setting if we want communication-efficient
protocols. Besides approximation, the layout and the distribution of the input are also important factors for
reducing communication.

An interesting future direction is to further investigate efficient communication protocols for approx- 
imately computing statistical and graph problems in the message-passing model, and to explore realistic 
distributions and layouts of the inputs. 

One question which we have not discussed in this paper, but which is important in practice, is whether we can
obtain round-efficient protocols that (almost) match the lower bounds, which hold even for round-inefficient
protocols. Most simple protocols presented in this paper only need a constant number of rounds, except the
ones for bipartiteness and (approximate) diameter, where we need to grow BFS trees which are inherently
sequential (they require $\Omega(\Delta)$ rounds where $\Delta$ is the diameter of the graph). Using the sketching algorithm in
[5], we can obtain a 1-round protocol for bipartiteness that uses $O(kn)$ bits of communication. We do not
know whether a round-efficient protocol exists for the additive-2 approximate diameter problem that could
(almost) match the $O(kn^{3/2})$ bits upper bound obtained by the round-inefficient protocol in Section 6.6.